Abstract

In this paper, it is shown that a continuum of distributions best characterizes the hidden
layer outputs of a multilayer perceptron when trained as a 0-1 classifier and tested with a range
of signal-to-noise ratio (SNR) input distributions. A four parameter system of transformed
normal distributions, known as the Johnson system of distributions, is utilized to illustrate the
shape of output distributions as a function of input SNR levels.


1  Introduction 

In this paper, a feedforward multilayer perceptron trained as a 0-1 classifier with backpropagation of error
is considered. It will be shown that the Johnson system of continuous distributions [1] can be used to
characterize the continuum of signal-to-noise ratio (SNR) in terms of the third and fourth order moments of
the distribution of pre-squashed neuron outputs (i.e., weighted sums of neuron inputs).

The Johnson system of distributions is generated by transformations of the form

$$Z = \gamma + \eta \, k_i(x; \lambda, \varepsilon),$$

where $Z$ is a standard normal variate. The parameters $\varepsilon$ and $\lambda$ are location and scale parameters, respectively,
while $\eta$ and $\gamma$ are shape parameters. Johnson [2] suggested the following three functions $k_i$ to cover a wide
range of possible shapes:

$k_1$ defines the $S_U$ distribution, where

$$k_1(x; \lambda, \varepsilon) = \sinh^{-1}\left(\frac{x - \varepsilon}{\lambda}\right),$$

$k_2$ defines the $S_B$ distribution, where

$$k_2(x; \lambda, \varepsilon) = \ln\left(\frac{x - \varepsilon}{\lambda + \varepsilon - x}\right),$$

and $k_3$ defines the $S_L$ distribution, where

$$k_3(x; \lambda, \varepsilon) = \ln\left(\frac{x - \varepsilon}{\lambda}\right).$$
The $S_B$ family is bounded on $(\varepsilon, \varepsilon + \lambda)$ and the $S_U$ family is unbounded, where $\lambda > 0$ and $\varepsilon$ may take
any real value. The $S_L$ distributions divide the skewness-by-kurtosis plane into two regions such that the $S_B$
distributions lie in one of the regions and the $S_U$ distributions lie in the other.
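As an illustration only, the three transformations can be sketched in Python using the standard library. The function names `johnson_z` and `johnson_cdf` and the family labels "SU", "SB", and "SL" are hypothetical and do not come from the paper. The sketch assumes the standard forms of the Johnson transformations, with the shape parameter eta and the scale parameter lam both positive, so that the CDF is the standard normal CDF evaluated at the transformed value.

```python
import math
from statistics import NormalDist


def johnson_z(x, gamma, eta, lam, eps, family):
    """Return z = gamma + eta * k(x; lam, eps) for the chosen Johnson family."""
    u = (x - eps) / lam
    if family == "SU":        # unbounded: k1 = asinh((x - eps) / lam)
        k = math.asinh(u)
    elif family == "SB":      # bounded: k2 = ln((x - eps) / (lam + eps - x))
        if not 0.0 < u < 1.0:
            raise ValueError("S_B requires eps < x < eps + lam")
        k = math.log(u / (1.0 - u))
    elif family == "SL":      # lognormal: k3 = ln((x - eps) / lam)
        if u <= 0.0:
            raise ValueError("S_L requires x > eps")
        k = math.log(u)
    else:
        raise ValueError("family must be 'SU', 'SB', or 'SL'")
    return gamma + eta * k


def johnson_cdf(x, gamma, eta, lam, eps, family):
    """CDF of the Johnson variate, assuming eta > 0 and lam > 0."""
    return NormalDist().cdf(johnson_z(x, gamma, eta, lam, eps, family))
```

For example, with gamma = 0, eta = 1, lam = 1, and eps = 0, the midpoint x = 0.5 of the $S_B$ support maps to z = 0, so the CDF there is 0.5.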

2  Fitting  Johnson  Distributions  by  the  Method  of  Quantile  Matching

Figure 1 presents the skewness-by-kurtosis plane for a set of distributions of weighted sums of squashed
hidden layer outputs. As noted on the plot, for each distribution, the relative signal level (RSL) of the signals
varies from 0 dB to -6 dB in decrements of 2 dB; the skewness and kurtosis of the noise-only distribution
are also noted on the plot. This plot clearly indicates the relationship of the skewness and kurtosis of these
distributions to the SNR. As the RSL (i.e., SNR) decreases, the skewness ranges from negative to positive
values while, at the same time, the kurtosis first decreases until the RSL is about -4 dB and then increases as
the RSL continues to decrease.
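For readers who wish to place their own samples on such a plane, the following minimal sketch computes sample skewness and kurtosis as the standardized third and fourth central moments. The function name `skewness_kurtosis` is hypothetical. The sketch assumes the simple (biased) moment estimators and the non-excess kurtosis convention, under which a Gaussian has kurtosis 3; the paper does not state which estimators were used.

```python
def skewness_kurtosis(sample):
    """Return (skewness, kurtosis) from central moments of the sample.

    Assumptions: biased moment estimators (divide by n) and non-excess
    kurtosis, so a Gaussian sample gives a kurtosis near 3.
    """
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    if m2 == 0.0:
        raise ValueError("sample has zero variance")
    return m3 / m2 ** 1.5, m4 / m2 ** 2
```

A sample that is symmetric about its mean gives a skewness of zero, and a sample with a long right tail gives a positive skewness.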

The parameters $\eta$, $\gamma$, $\lambda$, and $\varepsilon$ are estimated by a refinement of the method of quantile matching as described
by Slifker and Shapiro [3]. Let $x_1, \ldots, x_N$ be the given sample.

(1) For a given unit normal quantile $z$, calculate $p_z = P(Z < z)$ and $p_{3z} = P(Z < 3z)$, where $Z$ is a standard
normal variate. Set $p_{-z} = 1 - p_z$ and $p_{-3z} = 1 - p_{3z}$.

(2) From the data, calculate the sample quantiles $Q(p_\zeta)$, $\zeta = \pm z, \pm 3z$, as follows: calculate $i = N p_\zeta$;
then $Q(p_\zeta) = x_{(i)}$, where $x_{(i)}$ is the $i$th ordered observation in the sample. Since $i$ in general will not be an
integer, it will be necessary to interpolate.

(3) Calculate $p = Q(p_z) - Q(p_{-z})$, $m = Q(p_{3z}) - Q(p_z)$, and $n = Q(p_{-z}) - Q(p_{-3z})$. We then check $c = mn/p^2$:

If $c > 1$, then use the $S_U$ parameters.

If $c < 1$, then use the $S_B$ parameters.

If $c = 1$, then use the $S_L$ parameters.

(4) The formulas for the estimates of the parameters $\eta$, $\gamma$, $\lambda$, and $\varepsilon$ are given in Reference [3] and will not be
repeated here.
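Steps (1) through (3) can be sketched as follows, using only the Python standard library. This is an illustrative sketch rather than the authors' implementation: the names `sample_quantile` and `select_family`, the linear interpolation rule, the clamping of the index to the sample range, and the tolerance `tol` used to decide when the ratio is "equal" to 1 are all assumptions. The ratio is taken to be mn/p^2, following the selection rule of Slifker and Shapiro. Step (4), the parameter formulas of Reference [3], is not reproduced.

```python
import math
from statistics import NormalDist


def sample_quantile(sorted_x, prob):
    """Interpolated order statistic at the fractional 1-based index i = N * prob."""
    n = len(sorted_x)
    i = min(max(n * prob, 1.0), float(n))   # clamp to [1, N] (assumption)
    lo = math.floor(i)
    hi = min(lo + 1, n)
    return sorted_x[lo - 1] + (i - lo) * (sorted_x[hi - 1] - sorted_x[lo - 1])


def select_family(sample, z, tol=1e-3):
    """Return (c, family) with c = m * n / p**2 for the normal quantile z > 0."""
    xs = sorted(sample)
    nd = NormalDist()
    p_z, p_3z = nd.cdf(z), nd.cdf(3.0 * z)              # step (1)
    q_z = sample_quantile(xs, p_z)                      # step (2)
    q_3z = sample_quantile(xs, p_3z)
    q_mz = sample_quantile(xs, 1.0 - p_z)
    q_m3z = sample_quantile(xs, 1.0 - p_3z)
    p = q_z - q_mz                                      # step (3)
    m = q_3z - q_z
    n = q_mz - q_m3z
    if p == 0.0:
        raise ValueError("central quantiles coincide; choose a larger z")
    c = m * n / (p * p)
    if abs(c - 1.0) <= tol:
        family = "SL"
    elif c > 1.0:
        family = "SU"
    else:
        family = "SB"
    return c, family
```

With evenly spaced data on a bounded interval the ratio falls below 1 and the sketch selects the bounded family, whereas heavy-tailed data push the ratio above 1.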

For an example distribution, Figure 2(a) shows the values of $c$ as a function of $z$. Here
$z$ ranges from 0.01 to 0.80 in increments of $\Delta z = 0.01$. Figure 2(b) shows the corresponding Kolmogorov-
Smirnov (KS) distance measuring the goodness of fit of the calculated Johnson distribution to the sample
data. The authors have observed that these results depend on the resolution of the step size in $z$. Under the
assumption that the quantile matching method places us within the neighborhood of a better fit, a stochastic
optimization procedure was applied. This procedure is described in the next section.
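The KS distance used as the goodness-of-fit measure is the largest absolute difference between the empirical distribution function and the fitted CDF. A minimal sketch is given below; the function name `ks_distance` is hypothetical, and the fitted CDF is passed in as any callable, so the sketch does not depend on how the Johnson parameters were obtained.

```python
def ks_distance(sample, cdf):
    """Kolmogorov-Smirnov distance between a sample and a continuous CDF.

    The empirical distribution function jumps from (i - 1) / n to i / n at
    the i-th ordered observation, so both sides of each jump are checked.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

The quantity plotted in Figure 2(b) would then correspond to `1 - ks_distance(sample, cdf)` evaluated for the Johnson fit obtained at each value of z.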


Figure 1: Skewness by Kurtosis Values of Noise Only Distribution and Signal Distributions Indexed
by RSL.

Figure 2: (a) Top: Plot of $c = mn/p^2$ as a Function of the Normal Quantile Value z. (b) Bottom: Plot
of 1-(KS Distance) as a Function of the Normal Quantile Value z.

Figure 3: Johnson CDF's Overplotted with the Empirical Distribution Functions (axis label: Weighted
Sums of Sigmoided Neuron Outputs).

3  Stochastic  Optimization  of  the  Parameter  Estimates 

A further refinement for obtaining better density fits is the stochastic optimization of the parameter estimates,
in which the estimates are randomly perturbed by a small uniformly distributed quantity. The results of this
parameter optimization are shown in Figure 3, together with the corresponding probability values for these
fits. The randomly perturbed parameters were first evaluated with respect to the first and last quantiles of the
data. If the fit was within an acceptable range, the entropy of the new distribution was then calculated using
the empirical data samples and the new probability density function. If the entropy of the new distribution
was greater than that of the best fitting distribution, the KS probability and KS distance were calculated for
the new distribution. If the KS probability was within an acceptable range, the parameters of the new
distribution were adopted as the current parameter values. Finally, for each update of the parameters, the best
KS fit was saved separately, because the KS fit is allowed to degrade as the entropy is maximized. Typically,
1000 different distributions were evaluated with a quadratic or third order cooling schedule. This ad hoc
technique was developed for the timely production of improved fits, which were needed to illustrate the change
of shape of the output distributions. An alternative method, which calculates maximum likelihood estimates of the
Johnson parameters, is a focus of our most recent work [5].
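The general idea of perturbing parameters at random under a cooling schedule can be sketched as below. This is a deliberately simplified stand-in, not the procedure described above: it omits the quantile pre-check, the entropy criterion, and the KS probability test, and simply keeps a perturbed parameter set whenever its KS distance improves. The names `perturb_search` and `make_cdf`, the initial step size, and the interpretation of a quadratic cooling schedule as a perturbation width shrinking with the square of the remaining fraction of iterations are all assumptions.

```python
import random


def ks_distance(sample, cdf):
    """Kolmogorov-Smirnov distance between a sample and a continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def perturb_search(sample, params, make_cdf, n_iter=1000, step0=0.2, seed=0):
    """Random uniform perturbation search that keeps the best KS fit.

    make_cdf(params) must return a CDF callable, or raise ValueError when
    the parameters are invalid (such trials are skipped).
    """
    rng = random.Random(seed)
    best = list(params)
    best_d = ks_distance(sample, make_cdf(best))
    for t in range(n_iter):
        # Quadratic cooling (assumption): the perturbation width shrinks to zero.
        step = step0 * (1.0 - t / n_iter) ** 2
        trial = [p + rng.uniform(-step, step) * max(abs(p), 1.0) for p in best]
        try:
            d = ks_distance(sample, make_cdf(trial))
        except ValueError:
            continue
        if d < best_d:              # greedy acceptance on KS distance only
            best, best_d = trial, d
    return best, best_d
```

Here `make_cdf` would wrap a Johnson CDF built from the four parameters, with the quantile matching estimates supplied as the starting point `params`.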


4  Conclusions 

As can be seen in Figure 4, the shape of the distribution of the weighted sums is a function of the SNR of the
neural network inputs. For performance measures such as ROC curves and Recognition Differentials, this feature
of neural network classifiers illustrates that the overlapped noise and signal distributions are at least four
parameter distributions. In previous work (References [1] and [4]), we have shown that for some cases, the two
parameter Gaussian or three parameter lognormal distributions can provide reasonable fits. The results of this
effort have shown that at least four parameters are generally needed.

Figure 4: Johnson Densities Corresponding to the Fitted CDF's.

In Figure 5, the skewness and kurtosis of the empirical data distributions (dotted line) are plotted with
those of the corresponding fitted Johnson distributions. Although empirical estimates of skewness and kurtosis
can be highly variable, the plot of the empirical estimates illustrates that shape is a function of SNR. The
Johnson fits further illustrate this relationship. Note that the -2 dB, -4 dB, -6 dB, and Noise distributions are
almost collinear with kurtosis on a log scale. Also note that the -2 dB distribution has the minimum kurtosis and is
the most symmetric, i.e., its skewness is approximately zero. This observed relationship suggests that in some
cases, a change in shape can be easily parametrized as a line in the skewness versus log(kurtosis) plane.

In conclusion, the shape of the distributions of neuron outputs has been shown to be a function of the
SNR of the input distributions. For performance measures such as ROC curves and Recognition Differentials
(RD), this means that one cannot assume a fixed distribution, e.g., Gaussian, in which only the mean and
variance change. Furthermore, a minimal parametrization, i.e., Johnson distributions, can provide RD estimates
which account for such changes in shape due to changes in input signal levels.